Skip to content

fix(cua): correct Gemini key/drag arg names + safer session cleanup#155

Merged
masnwilliams merged 1 commit intohypeship/unified-cua-templatefrom
hypeship/fix-gemini-args-and-session
May 4, 2026
Merged

fix(cua): correct Gemini key/drag arg names + safer session cleanup#155
masnwilliams merged 1 commit intohypeship/unified-cua-templatefrom
hypeship/fix-gemini-args-and-session

Conversation

@masnwilliams
Copy link
Copy Markdown
Contributor

@masnwilliams masnwilliams commented May 2, 2026

Summary

Stacks on top of the unified CUA template branch. Addresses three of the outstanding Cursor Bugbot findings on PR #143.

Gemini argument-name mismatches

Both bugs cause the affected actions to be silently/structurally broken — fixed in TS and Python in lockstep with the standalone gemini-computer-use templates.

  • key_combination read from args.key_combination → now reads args.keys. Gemini's schema sends the combo in keys (a single +-joined string), so the previous code always saw an empty combo and silently dropped every shortcut.
  • drag_and_drop read from start_x / start_y / end_x / end_y → now reads x / y / destination_x / destination_y. Gemini's schema uses the latter; the previous names always resolved to 0, so every drag went (0,0) -> (0,0).

Session cleanup ordering (TS + Python)

session.stop() previously placed the state-reset (_sessionId = null, …) after the try / finally that performs replay-stop + browser-delete. If stopReplay() threw, the finally deleted the browser, the exception then propagated past the cleanup lines, and a follow-up stop() from the caller's error path would attempt to delete the already-destroyed session — masking the original error.

Moved the state-reset into the finally, so a second stop() is a safe no-op regardless of how the first attempt unwound.

Bugbot findings I checked but did not change

These are flagged on the PR but already fixed in the current branch (probably resolved between scans):

  • Anthropic system prompt date freezing — already a getSystemPrompt() function that reads new Date() per task.
  • Python session.py using browsers.delete — already uses browsers.delete_by_id.
  • Python OpenAI provider missing screenshot action — already returns [] for screenshot.
  • Python Gemini dropping screenshots — already appends an inline_data Part per response when result["screenshot"] is set.
  • PREDEFINED_ACTIONS unused — leftover constant; harmless. Left as-is to keep the diff focused.

Test plan

  • go build ./...
  • go test ./pkg/create/...
  • Deploy CUA template with Gemini provider, ask it to perform a drag, then a keyboard shortcut (e.g. ctrl+a, ctrl+l) — confirm both succeed
  • Force a replay-stop failure (e.g. invalid replay state) and confirm session.stop() can be called twice without crashing

Note

Medium Risk
Moderate risk: changes input parsing for Gemini actions and alters session teardown ordering, which could affect browser lifecycle and replay cleanup if edge cases were missed.

Overview
Fixes Gemini CUA action argument mismatches in both TS and Python so key_combination reads the keys "+"-joined string and drag_and_drop uses x/y and destination_x/destination_y.

Makes KernelBrowserSession.stop() safer by resetting session/replay state inside the finally block and deleting the browser using a captured sessionId, preventing double-deletes when replay stopping throws.

Reviewed by Cursor Bugbot for commit 1c6b554. Bugbot is set up for automated code reviews on this repo. Configure here.

- Gemini key_combination reads 'keys' (was: 'key_combination'), matching
  the standalone gemini-computer-use template and Gemini's actual schema.
  Without this, all keyboard shortcuts silently fail.
- Gemini drag_and_drop reads x/y + destination_x/destination_y (was:
  start_x/start_y/end_x/end_y), matching Gemini's schema. Without this,
  every drag resolves to (0,0)->(0,0).
- Session.stop() resets state inside finally so a thrown replay/delete
  error doesn't leave _sessionId set, which would otherwise cause a
  follow-up stop() in the caller's error path to delete the already-
  destroyed browser and mask the original error.

Applied symmetrically in TS and Python.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@masnwilliams masnwilliams marked this pull request as ready for review May 4, 2026 16:37
@firetiger-agent
Copy link
Copy Markdown

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR modifies CUA (Cursor UI Agent) template code, not kernel API endpoints or Temporal workflows in packages/api/

To monitor this PR anyway, reply with @firetiger monitor this.

@masnwilliams masnwilliams merged commit f588e58 into hypeship/unified-cua-template May 4, 2026
3 checks passed
@masnwilliams masnwilliams deleted the hypeship/fix-gemini-args-and-session branch May 4, 2026 16:38
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 1c6b554. Configure here.

const info = this.info;

if (this._sessionId) {
const sessionId = this._sessionId;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second stop() call throws instead of being no-op

Medium Severity

The stop() method calls this.info (TS) / self.info (Python) unconditionally at the top, before the if (this._sessionId) guard. The info getter delegates to the sessionId getter, which throws when _sessionId is null. After the newly-added state reset in finally sets _sessionId to null, a follow-up stop() call from the caller's error path will throw "Session not started" instead of being the safe no-op described in the comment and PR description. The if guard that makes it a no-op is never reached.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 1c6b554. Configure here.

masnwilliams added a commit that referenced this pull request May 5, 2026
## Summary

Stacks on top of the unified CUA template branch. Addresses the new
High-Severity Cursor Bugbot finding on PR #143's latest scan: `stop()`
throws on repeated call instead of being a no-op.

## Background

PR #155 moved the session-state reset (`_sessionId = null`, …) into the
`finally` so that a thrown replay-stop or browser-delete error wouldn't
leave stale state behind. That fix was correct, but it exposed a latent
bug: `stop()` opens with `const info = this.info` (TS) / `info =
self.info` (Py), and the `info` getter delegates to the `sessionId`
property, which raises when `_sessionId` is null. So once PR #155
reliably cleared `_sessionId` on the first call, the caller's error-path
retry would hit the throwing getter and mask the original exception —
exactly the failure mode PR #155 was meant to prevent.

## Fix

`stop()` now:

1. Short-circuits at the top with a sentinel `SessionInfo` when no
session is active — never touches the throwing getter.
2. Builds the live-session `info` from local fields directly so the body
never depends on `this.info` / `self.info` either.

Symmetric in TS (`session.ts`) and Python (`session.py`).

## Test plan

- [x] `go build ./...`
- [x] `go test ./pkg/create/...`
- [ ] Force a replay-stop failure (e.g. delete the replay out-of-band,
or pass an invalid replay id) and confirm calling `session.stop()` twice
from the caller's error path no longer raises and the original error
surfaces.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches session shutdown/cleanup logic in both Python and TypeScript
templates; while small, mistakes could lead to leaked browser sessions
or masked errors during teardown.
> 
> **Overview**
> `KernelBrowserSession.stop()` in both
`pkg/templates/python/cua/session.py` and
`pkg/templates/typescript/cua/session.ts` is updated to be
**idempotent**.
> 
> It now short-circuits when no active session exists (returning a
sentinel `SessionInfo` without reading `info`/`sessionId`), and builds
the returned `SessionInfo` directly from internal fields so teardown
can’t fail due to accessing a getter after state has been cleared.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
db0bdea. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant